Deep Learning in MRI-guided Radiation Therapy: A Systematic Review

MRI-guided radiation therapy (MRgRT) offers a precise and adaptive approach to treatment planning. Deep learning applications which augment the capabilities of MRgRT are systematically reviewed, with emphasis placed on underlying methods. Studies are further categorized into the areas of segmentation, synthesis, radiomics, and real-time MRI. Finally, clinical implications, current challenges, and future directions are discussed.


INTRODUCTION
Recent innovations in magnetic resonance imaging (MRI) and deep learning are complementary and hold great promise for improving patient outcomes. With the advent of the Magnetic Resonance Imaging Guided Linear Accelerator (MRI-LINAC) and MR-guided radiation therapy (MRgRT), MRI allows for accurate and real-time delineation of tumors and organs at risk (OARs) that may not be visible with traditional CT-based plans. [31] Deep learning methods augment the capabilities of MRI by reducing acquisition times, generating electron density information crucial to treatment planning, and increasing spatial resolution, contrast, and image quality. In addition, MRI auto-segmentation and dose calculation methods greatly reduce the required human effort on tedious treatment planning tasks, enabling physicians to further optimize treatment outcomes. Finally, deep learning methods offer a powerful tool in predicting the risk of tumor recurrence and adverse effects. These advancements in MRI and deep learning usher in the era of fully adaptive radiation therapy (ART) and the MRI-only workflow. [176] Deep learning methods represent a broad class of neural networks which derive abstract context through millions of sequential connections. While applicable to any imaging modality, these algorithms are especially well suited to MRI due to its high information density. [107] Deep learning methods demonstrate state-of-the-art performance compared with traditional hand-crafted and machine learning methods but are computationally intensive and require large datasets. For MRI and other imaging tasks, convolutional neural networks (CNNs), built on local context, have traditionally dominated the field. However, advancements in network architecture, availability of more powerful computers, large high-quality datasets, and increased academic interest have led to rapid innovation.
Especially exciting is the rapid adaptation of cutting-edge recurrent, attention, and self-attention methods, which continue to improve upon and even replace CNNs.
Deep learning techniques can be organized according to their applications in MRgRT in the following groups: segmentation, synthesis, radiomics (classification), and real-time/4D MRI. Segmentation methods automatically delineate tumors, OARs, and other structures. However, deep learning approaches face challenges when adapting to small tumors, multiple organs, low contrast, and differing ground truth contour quality and style. These challenges differ greatly depending on the region of the body, so segmentation methods are primarily organized by anatomical region. [134] Synthesis methods are best understood by their input and output modalities. Going from MRI to CT, synthetic CT (sCT) provides accurate attenuation information not apparent in MRI, augmenting the information of co-registered CT images. In an MRI-only workflow, sCT avoids registration errors and the radiation exposure associated with traditional CT. [68] In addition, synthetic relative proton stopping power (sRPSP) maps can be generated to directly obtain dosimetric information for proton radiation therapy. [239] Dose calculation can be further improved with deep learning methods, which greatly reduce inference time and could yield lower dosimetric uncertainties compared to traditional Monte Carlo (MC) methods. Synthetic MRI (sMRI) generated from CT is appealing because it combines the speed and dosimetric information of CT with MRI's high soft tissue contrast. Although CT's lower soft tissue contrast makes this application much more challenging, sMRI has still found success in improving CT-based segmentation accuracy. [49,122,132] Alternatively, there are rich intramodal applications in generating one MRI sequence from another. For example, the spatial resolution of clinical MRI can be increased by predicting a higher resolution image [33,272], and the administration of contrast agents can be avoided with synthetic contrast MRI. [103]
Radiomics represents an eclectic body of work but can be divided into studies which classify structures in an MR image [254] and prognostic models which use MR images to predict treatment outcomes such as tumor recurrence or adverse effects. [119,273] Deep learning methods in real-time and 4D MRI overcome MRI's long acquisition time and the low field strengths of the MRI-LINAC by reconstructing images from undersampled k-space [212], synthesizing additional MRI slices [77], and exploiting periodic motion to improve image quality [63].
In this review, we systematically examine studies that apply deep learning to MRgRT, categorizing them based on their application and highlighting interesting or important contributions. We also discuss future trends in deep learning and MRgRT.

LITERATURE SEARCH
This systematic review surveys literature which implements deep learning methods and MRI for radiation therapy research. "Deep learning" is defined to be any method which includes a neural network directly or indirectly. These include machine learning models and other hybrid architectures which take deep learning derived features as input. Studies including MRI as at least part of the dataset are included. Studies must list their purpose as being for radiation therapy and include patients with tumors. Studies on immunotherapy and chemotherapy without radiation therapy are excluded. Conference abstracts and proceedings are excluded due to an absence of strict peer review.
The literature search was performed on Pubmed on December 31, 2022, with the following search criteria in the title or abstract: "deep learning and (MRI or MR) and radiation therapy". This search yielded 335 results. Of these results, 197 were included based on manual screening using the aforementioned criteria. 78 were classified as segmentation, 81 as synthesis, 24 as radiomics (classification), and 14 as real-time or 4D MRI. There is inevitably some overlap in these categories. In particular, studies which use sMRI for the purposes of segmentation are classified as synthesis and papers which deal with real-  Figure 1 shows the papers sorted by category and year. Compared to other review papers, this review paper is more comprehensive in its literature search and is the first specifically on the topic of deep learning in MRgRT. In addition, this work uniquely focuses on the underlying deep learning methods as opposed to their results. Figure 2 shows technical trends in deep learning methods implementing 3D convolution, attention, recurrent, and GAN techniques.

IMAGE SEGMENTATION
Contouring (segmentation) in MRgRT is the task of delineating targets of interest on MR images and can be broadly divided into two categories: contouring of organs at risk and other anatomical structures expected to receive radiation dose, and contouring of individual tumors. Contouring is typically performed by dosimetrists, physicists, and physicians. Both tumor and multi-organ segmentation suffer from intra- and inter-observer variability. [58] MRI does not capture the true extent of the tumor volume, and poorly defined boundaries and similar-appearing structures such as calcifications lead to institutional and intra-observer variability. Physician contouring conventions and styles further complicate the segmentation task and lead to inter-observer variability. [8,10] Multi-organ segmentation is mostly challenged by the large number of axial slices and OARs, which make the task tedious and prone to error. Automated solutions to MRI segmentation have been proposed to reduce physician workload and provide expert-like performance.
Since the application of CNNs to MRI-based segmentation in 2017 [155], fully convolutional networks (FCNs) have outperformed competing atlas-based and hand-crafted auto-segmentation methods, often matching the intra-observer variability among physicians [19]. FCNs employ convolutional layers which are trained to detect patterns in either nearby voxels or feature maps output from previous convolutional layers. In contrast with traditional CNNs, FCNs forgo densely connected layers. This design choice enables voxel-wise segmentation, allows for variable-sized images, and reduces model complexity and training time. Different types of convolutions include atrous and separable convolutions. Atrous convolutions sample more sparsely to gain a wider field of view and can be mixed and matched to capture large and small features in the same layer. Separable convolutions divide a 2D convolution into two 1D convolutions to use fewer parameters for similar results. By connecting multiple convolutional layers together with non-linear activation functions, larger and more abstract regions of the input image are analyzed to form the encoder. For pixel-wise segmentation, the final feature map is expanded to the original image resolution through a corresponding series of transposed convolutional layers forming the decoder. FCNs typically include pooling layers to conserve computational resources, whereby the resolution of feature maps is reduced by keeping the largest local value (max-pooling) or the average of the local values (average pooling). [3] To evaluate performance, various evaluation metrics are employed, with the Dice similarity coefficient (DSC) being the most prevalent. The DSC is defined in equation 1 (Eq 1) as the overlap between the ground truth physician contours and the predicted algorithmic volumes, with a value of 0 corresponding to no overlap and 1 corresponding to complete overlap.
Mathematically, it is defined as follows, where VOL_GT is the ground truth volume and VOL_PT is the predicted volume [146]:

$$\mathrm{DSC} = \frac{2\,\lvert \mathrm{VOL}_{GT} \cap \mathrm{VOL}_{PT} \rvert}{\lvert \mathrm{VOL}_{GT} \rvert + \lvert \mathrm{VOL}_{PT} \rvert} \qquad (1)$$

Additional metrics include the Hausdorff distance [146], which measures the largest distance from a point on one of the two volumes (ground truth or algorithmic) to the nearest point on the other; the volume difference [15], which is simply the difference in volumes; and the Jaccard Index [52], which is similar to the DSC and measures the overlap between VOL_PT and VOL_GT relative to their combined volumes. A discussion of these metrics is found in Müller et al. [177] However, performance between datasets must be compared with caution due to high inter-observer variation between physicians and differences in dataset quality.
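As a concrete illustration of the overlap and distance metrics described above, the following is a minimal NumPy sketch operating on binary masks. The function names, the `spacing` and `voxel_volume` parameters, and the brute-force Hausdorff computation are assumptions made for illustration and are not taken from any of the reviewed studies; production code would typically rely on a dedicated library and a surface-based distance.

```python
import numpy as np


def _as_masks(gt, pred):
    """Convert inputs to boolean arrays and check that their shapes match."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    if gt.shape != pred.shape:
        raise ValueError("ground truth and prediction must have the same shape")
    return gt, pred


def dice_coefficient(gt, pred):
    """DSC = 2|GT n PT| / (|GT| + |PT|); defined as 1.0 when both masks are empty."""
    gt, pred = _as_masks(gt, pred)
    intersection = np.logical_and(gt, pred).sum()
    total = gt.sum() + pred.sum()
    return 1.0 if total == 0 else float(2.0 * intersection / total)


def jaccard_index(gt, pred):
    """Jaccard = |GT n PT| / |GT u PT|; defined as 1.0 when both masks are empty."""
    gt, pred = _as_masks(gt, pred)
    union = np.logical_or(gt, pred).sum()
    intersection = np.logical_and(gt, pred).sum()
    return 1.0 if union == 0 else float(intersection / union)


def volume_difference(gt, pred, voxel_volume=1.0):
    """Signed volume difference (predicted minus ground truth)."""
    gt, pred = _as_masks(gt, pred)
    return float((int(pred.sum()) - int(gt.sum())) * voxel_volume)


def hausdorff_distance(gt, pred, spacing=None):
    """Symmetric Hausdorff distance over all foreground voxels.

    Brute force: memory grows with (#GT voxels x #PT voxels), so this is
    only suitable for small masks.
    """
    gt, pred = _as_masks(gt, pred)
    if not gt.any() or not pred.any():
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    spacing = np.ones(gt.ndim) if spacing is None else np.asarray(spacing, dtype=float)
    a = np.argwhere(gt) * spacing      # physical coordinates of GT voxels
    b = np.argwhere(pred) * spacing    # physical coordinates of predicted voxels
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance, taken in both directions.
    return float(max(dists.min(axis=1).max(), dists.min(axis=0).max()))
```

For two 2x2 squares offset by one voxel, for instance, half of each mask overlaps, so the DSC is lower than 1 while the volume difference is zero, which illustrates why volume difference alone is a weak indicator of contour agreement.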
The properties of MRI datasets have driven innovation. Multiple MRI sequences, with and without contrast, are often available. To capture all data, the different sequences are co-registered and input as multiple channels yielding multiple segmentations. These segmentations are combined to produce a final segmentation using an average, weighted average, or more advanced method. To account for MRI's high through-plane resolution relative to its in-plane resolution, 3D convolutional layers are often utilized to capture features not apparent in 2D convolution. However, 3D convolutions are computationally expensive, so numerous 2.5D architectures have been proposed. [93,241,257] In a 2.5D architecture, adjacent MRI slices are input as channels, and 2D convolutions are performed. It is also common to see new papers forgo the 3D convolution to save resources for new computationally intense methods. An unfortunate fact is that high-quality MRI datasets are often small. To remedy this, data augmentation methods such as rotating and flipping the MR images are ubiquitous. In addition, the generation of synthetic images to increase dataset size and generalizability is an exciting field of research. [11] Public datasets and competitions have also helped in this regard. For example, the Brain Tumor Segmentation Challenge (BraTS) dataset [166], updated since 2012, has been a primary contributor to brain segmentation progress, spawning the popular DeepMedic framework [112]. Another approach for small datasets is transfer learning. In transfer learning, a model is trained on a large dataset, and then retrained on a smaller dataset with the idea that many of the previously found features are transferable. [275] Advances from the field of natural language processing (NLP) have had a tremendous impact on segmentation tasks. Recurrent neural networks (RNNs) are defined by the output of their node being connected to the input of their node. 
To avoid an infinite loop, the output is only allowed to connect to its input a set number of times. This property allows for increased context and the ability to handle sequential data, which is especially important in language translation. Applied to CNNs, each recurrent convolutional layer (convolution + activation function) is performed multiple times, which creates a wider field of view and more context with each subsequent convolution. However, recurrent layers can suffer from a vanishing gradient problem. Long short-term memory (LSTM) blocks address this by adding a "forget" gate which discards irrelevant information. In addition, LSTMs are more capable of making long-range connections. Similar to the LSTM, the gated recurrent unit (GRU) has an update gate and a reset gate which decide which information to pass on and which to forget. Both LSTM and GRU also have bidirectional versions which pass information forward and backward. [36,154] Relative performance between the LSTM and GRU is situational, with the GRU being less computationally expensive. [252] A major issue faced in MRI segmentation can be characterized as "the small tumor problem". Small structures like tumors or brachytherapy fiducial markers represent a small fraction of the total MRI volume, so CNNs can struggle to find them or be confused by noise. Further exacerbating the problem is that applying a deep CNN to whole MR images consumes extensive computational resources, so the MRI must be downsampled. In this case, the downsampling is very likely to cause small tumors to be missed entirely. One of the simplest ways to improve performance is to alter the loss function. Standard loss functions are cross-entropy and dice loss, which seek to maximize voxel-wise classification accuracy and overlap between the predicted and ground truth contours, respectively. These can be modified to achieve higher sensitivity to small structures at the expense of accuracy.
Focal loss is the cross-entropy loss modified for increased sensitivity [149] and Tversky loss does the same for the dice loss. [202] In addition, borders of the contours are the most important part of the segmentation, so boundary loss functions seek to improve model performance by placing increased emphasis on regions near the contour edge. [85,224] Another approach to solve the problem, albeit at the expense of long-range context, is with two stage networks. In the first stage, regions of interest (ROIs) are identified, and target structures are then contoured in the ROIs in the second stage. Notable efforts include Mask R-CNN [83] and Retina U-Net [97] which implement convolution-based ROI sub-networks with advanced correction algorithms. Seqseg instead replaces the correction algorithms with a reinforcement learning based model. [224] An agent is guided by a reward function to iteratively improve the conformity of the bounding box. Seqseg reported comparable performance with higher bounding box recall and intersection over union (IoU) compared to Mask R-CNN.
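To make the loss modifications discussed above more concrete, the following is a minimal NumPy sketch of a binary focal loss and a Tversky loss in their commonly used forms. The parameter names (`gamma`, `alpha`, `beta`) and default values are assumptions chosen for illustration and are not the settings of any reviewed study.

```python
import numpy as np


def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma.

    prob   : predicted foreground probabilities in [0, 1]
    target : binary ground truth (1 = structure, 0 = background)
    gamma  : focusing parameter; gamma = 0 recovers weighted cross-entropy
    alpha  : weight given to the foreground class
    """
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target)
    p_t = np.where(target == 1, prob, 1.0 - prob)        # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)  # class-dependent weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def tversky_loss(prob, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss = 1 - TP / (TP + alpha * FP + beta * FN).

    beta > alpha penalises false negatives more strongly than false
    positives, which favours sensitivity; alpha = beta = 0.5 recovers
    the (soft) dice loss.
    """
    prob = np.asarray(prob, dtype=float)
    target = np.asarray(target, dtype=float)
    tp = np.sum(prob * target)
    fp = np.sum(prob * (1.0 - target))
    fn = np.sum((1.0 - prob) * target)
    return float(1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps))
```

In this sketch, raising `beta` relative to `alpha` increases the penalty for missed foreground voxels, which is one way a loss can be tuned toward small structures at some cost in specificity.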
Related developments from NLP are the concepts of attention and the transformer. In terms of MRI, attention is the idea that certain regions of the MRI volume are more important to the segmentation task and should have more resources allocated to them. ROI schemes can then be defined as a form of hard attention by only considering the region around a tumor. A version of soft attention would weight the region around the tumor heavily and process the information in high resolution but also give a smaller weighting to nearby organs and process it in lower resolution. [214] In practice, attention modules include a fully connected feedforward neural network to generate weights between a feature map of the encoder and a shallower feature map in the decoder. These weights are improved upon through backpropagation of the entire network to give higher representational power to contextually significant areas of the image. This fully connected network can also be replaced with other models such as the RNN, GRU, or LSTM. [269] If the same feature map is compared with itself, this is called self-attention and is the basis for the transformer architecture. [233] The transformer can be thought of as a global generalization of the convolution and can even replace convolutional layers. The advantages of the transformer are explicit long-range context and the transformer's multi-head attention block allows for attention to be focused on different structures in parallel. However, transformers require more data to train and can be very computationally expensive. Such computational complexity can be remedied by including convolutional layers in hybrid CNN-transformer architectures [137], by making long range connections between voxels sparse, [30] or by implementing more efficient self-attention models like FlashAttention. 
[42] From the field of neuroscience, deep spiking neural networks (DSNNs) attempt to more closely model biological neurons by connecting neurons with asynchronous time-dependent spikes instead of the continuous connections between neurons of traditional neural networks. Potential advantages include lower power use, real-time unsupervised learning, and new learning methods. However, these advantages are only fully realized with special neuromorphic hardware, and DSNNs are difficult to train and currently lag conventional approaches. For these reasons, they are currently only represented by one paper in this review. [1] Many new models for MRI segmentation have been created by modifying U-Net. U-Net derives its name from its shape, which features convolutional layers in the encoder and transposed convolutional layers in the decoder. Its main innovation, however, is its long-range skip connections between the encoder and decoder. Dense U-Net densely connects convolutional layers in blocks [18], ResU-Net includes residual connections [45], Retina U-Net is a two-stage network, RU-Net includes recurrent connections, and R2U-Net adds residual recurrent connections [2]. Attention modules have also been added at the skip connections. [185,262] The aforementioned networks were all designed with 2D convolutions but can be modified to include 3D convolutions. Both V-Net [170] and nnUNet [94] were designed with 3D convolutional layers, with nnUNet additionally automating preprocessing and learning parameter optimization. Pix2pix uses U-Net as the generator with a convolutional discriminator (PatchGAN) [95].
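The self-attention operation that underlies the transformer, discussed earlier in this section, can be sketched in a few lines of NumPy. This is a single-head, scaled dot-product version under simplifying assumptions: the input is a sequence of flattened feature vectors ("tokens", for example image patches), and the projection matrices `w_q`, `w_k`, and `w_v` are hypothetical learned parameters supplied by the caller.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens        : (n_tokens, d_model) feature vectors, e.g. flattened patches
    w_q, w_k, w_v : (d_model, d_head) projection matrices (assumed learned)
    Returns the attended features (n_tokens, d_head) and the attention
    weights (n_tokens, n_tokens); each row of the weights sums to 1.
    """
    q = tokens @ w_q                      # queries
    k = tokens @ w_k                      # keys
    v = tokens @ w_v                      # values
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)    # every token attends to every token
    return weights @ v, weights
```

Because the weight matrix has one entry per pair of tokens, memory and compute grow quadratically with the number of tokens, which is the cost that the hybrid and sparse approaches mentioned above aim to reduce.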
Other state-of-the-art architectures include Mask R-CNN, DeepMedic, and DeepLabV3+. [25] Mask R-CNN is a two-stage network with a ResNet backbone. Mask Scoring RCNN (MS-RCNN) improves upon Mask R-CNN by adding a module which penalizes ROIs with high classification accuracy but low segmentation performance [91]. DeepMedic, designed for brain tumor segmentation, is an encoder-only CNN which inputs a ROI and features two independent low-resolution and normal-resolution channels. These channels are joined in a fully connected convolutional layer to predict the final segmentation. The convolutions in the encoder-only style reduce the final segmentation map dimensions compared to the original ROI (25x25x25 vs 9x9x9 voxels). DeepLabV3+ leverages residual connections and multiple separable atrous convolutions. Xception improves upon the separable convolution by reversing the order of the convolutions and including ReLU blocks after each operation for non-linearity. [32]

3.1 Brain
Largely unaffected by patient motion and composed of detailed soft tissue structures, the brain is an ideal site to benchmark segmentation performance for MRI and represents the dominant category in MRI segmentation research. Unique to brain MRI preprocessing is skull stripping, where the skull and other non-brain tissue are removed from the image. This can significantly improve results, especially for networks with limited training data. [111] As shown in Table 1, the majority of the studies focus on segmenting different brain tumors such as glioma, glioblastoma multiforme (GBM), and metastases. A small minority of studies focus on OARs like the hippocampus.
Advancements in brain segmentation have come, in large part, from the yearly Multimodal Brain Tumor Image Segmentation Benchmark (BraTS) challenge, which includes high-quality T1-weighted (T1W), T2-weighted (T2W), T1-contrast (T1C), and T2 Fluid-Attenuated Inversion Recovery (FLAIR) sequences with the purpose of segmenting the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) volumes. The WT is defined as the entire spread of the tumor visible on MRI; the ET is the inner core which shows significant contrast compared to healthy brain tissue; and the TC is the entire core including low contrast tissue. The most popular architectures are DeepMedic, created for the BraTS challenge, and U-Net.
Notable efforts in the BraTS challenge include Momin et al, who achieve an exceptional WT dice score of 0.97 ± 0.03 with a Retina U-Net based model and mutual enhancement strategy. In their model, Retina U-Net finds a ROI and segments the tumor. This feature map is fed into the classification localization map (CLM) module, which further classifies the tumor into subregions. The CLM shares the encoding path with a segmentation module, so classification and segmentation share information and are improved iteratively. [173] Huang et al focus on correctly segmenting small tumors. Based on DeepMedic, the method incorporates a prior scan and a custom loss function, the volume-level sensitivity-specificity (VSS) loss, which significantly improves sensitivity and specificity when segmenting small brain metastases. [89] Another paper improves small tumor detection by 2.5 times compared to the standard dice loss by assigning a higher weight to small tumors. [24] Lee et al take the novel approach of using standard dice loss for the first 40 epochs and changing to Tversky loss for the final 20 epochs to specify sensitivity and specificity. [130] Both Tian et al [228] and Ghaffari et al [72] utilize transfer learning to cope with limited data. Pan et al include a two-stage U-Net model with residual and attention blocks. [193] Ahmadi et al achieve competitive results in the BraTS challenge with a DSNN. [1]

3.2 Head and Neck
The head and neck (HN) region contains many small structures, making high-resolution and high-contrast imaging of great importance. MRI is especially preferred over CT imaging for patients with amalgam dental fillings because their metallic content can cause intense streaking artifacts on CT. [46] In addition, MRI is the standard of care for nasopharyngeal carcinoma (NPC), leading to significant research attention on auto-segmentation algorithms for HN MR images. Other research efforts include segmentation of oropharyngeal cancer, glands, and lymph nodes in the American Association of Physicists in Medicine (AAPM)'s RT-MAC challenge [20], as well as multi-organ segmentation.
Notable efforts include the two-stage multi-channel Seqseg architecture for NPC segmentation. [147] Seqseg uses reinforcement learning to refine the position of the bounding box, implements residual blocks, recurrent channel and region-wise attention, and a custom loss function that emphasizes segmentation of the edges of the tumor. Outierial et al [200] improves the dice score by 0.10 with a  [43] concludes that the union output from T1W and T2W sequences has similar performance to T1C MRI, suggesting that contrast may not be necessary for NPC segmentation. Similarly, Wahid et al [234] found that T1W and T2W sequences significantly improve performance, but dynamic contrast enhanced MRI (DCE) and diffusion weighted imaging (DWI) have little effect. Interesting approaches to gland and lymph node segmentation came out of the AAPM's RT-MAC challenge, with Kawahara et al's [115] 2.5D GAN and Korte et al [127] employing a two-stage architecture. The first stage segments the OARs in low resolution to create a bounding box, followed by U-Net segmenting the ROI in high resolution. Jiang et al segment the parotid glands using T2W MRI and unpaired CT images with ground truth contours. First, sMRI is generated from the CT volumes using a GAN. In the second step, U-Net generates probabilistic segmentation maps for both the sMRI and MRI based on the CT ground truth contours. These maps, along with sMRI and MRI data, are then input into the organ attention discriminator, which is designed to learn finer details during training, ultimately producing the final segmentations. [104]

3.3 Abdomen, Heart, and Lung
In contrast to the brain, the abdomen is susceptible to respiratory and digestive motion of the patient, often leading to poorly defined boundaries. While motion management techniques like patient breath-hold and not eating or drinking before treatment can mitigate these effects, the long acquisition time of MRI will inevitably lead to errors.
Often physicians must rely on anatomical knowledge to deduce the boundaries of OARs. This makes segmentation challenging for CNN-based architectures which build from local context. In addition, registration errors make including multiple sequences impractical. OARs segmented in the abdomen include the liver, kidneys, stomach, bowel, and duodenum. The liver and kidneys are not associated with digestion and are relatively stable, while the stomach, bowel, and duodenum are considered unstable. The duodenum is the most difficult for segmentation algorithms due to its small size, low contrast, and variability in shape. In addition, radiation-induced duodenal toxicity is often dose-limiting in dose escalation studies, making accurate segmentation of high importance. [120] Similar problems occur in the heart and lung because of their periodic motion, with the lung being particularly challenging since it is filled with low-signal air. However, MR segmentation of cardiac subregions has attracted growing interest as these are not visible on CT and have different tolerances to radiation. [219] The results are summarized in Table 3. Due to the large number of organs segmented in several of these studies, only the stomach and duodenum dice scores are reported to establish how the algorithms handle unstable organs. Zhang et al [161] generates a composite image from the current slice, prior slice, and contour map to predict the current segmentation with U-Net. Luximon et al [161] takes a similar approach by having a physician contour every 8th slice. These contours are then linearly interpolated and improved upon with a 2D Dense U-Net. The remaining studies do not require previous information and struggle to segment the duodenum. Ding et al [47] improves upon a physician-defined acceptable contour rate by up to 39% with an active contour model. A 3D Dense U-Net with sequential refinement networks is included in Fu et al [66].
Morris et al segments heart substructures with a 2-channel 3D U-Net. [175] Wang et al segments lung tumors with high accuracy, relying on segmentation maps from previous weeks with the aim of adaptive radiation therapy (ART). [237] An additional study by the same group feeds the features from the CNN into a GRU-based RNN to predict tumor position over the next 3 weeks. Attention is included to weigh the importance of the prior weeks' segmentation maps. [236]

3.4 Pelvis
The anatomy of the pelvis allows both external beam radiation therapy (EBRT) and brachytherapy approaches for radiation therapy. Therefore, MRI segmentation studies have proposed methods to contour fiducial markers and catheters for cervical and prostate therapy, as well as tumors and OARs. However, a current challenge is that fiducials and catheters are designed for CT and are not optimal for MRI segmentation. For example, in prostate EBRT, gold fiducial markers localize the prostate with high contrast and correct for motion. However, metal does not emit a strong signal on MRI, so fiducials on MRI are characterized by an absence of signal, which can be confused with calcifications. Despite this, MRI is enabling treatments with higher tumor conformality. For instance, the gross tumor volume (GTV) of prostate cancer is not well delineated on CT but is often visible on MRI. In addition, the prostate apex is significantly clearer on MRI. [235] MRI-based focal boost radiation therapy, in addition to a single dose level to the whole prostate, escalates additional dose to the GTV to reduce tumor recurrence.
[121] Table 4 shows relevant auto-segmentation techniques applied to the pelvic region. Shaaer et al [208] segments catheters with a T1W and T2W MRI-based U-Net model and takes advantage of catheter continuity to refine the contours in post-processing. Zabihollahy et al [260] creates an uncertainty map of cervical tumors by retraining the U-Net model with a randomly set dropout layer. This technique is called Monte Carlo Dropout (MCDO). Cao et al [19] takes pre-implant MRI and post-implant CT as input channels to their network. After performing intra-observer variability analysis, they achieve performance more similar to a specialist radiation oncologist for cervical tumors in brachytherapy than to a non-specialist. Eidex et al [52] segments dominant intraprostatic lesions (DILs) and the prostate for focal boost radiation therapy with a Mask R-CNN based architecture. Sensitivity is found to be an important factor in evaluating model performance because weak models can appear strong by missing difficult lesions entirely. Figure 3 shows an example of automatic contours of the prostate and DIL on T2W MRI which would not be visible on CT. STRAINet [180] realizes exceptional performance by utilizing a GAN with stochastic residual and atrous convolutions. In contrast with standard residual connections, each element of the input feature map which does not undergo convolution has a 1% chance of being set to zero. Singhrao et al [215] implements a pix2pix architecture for fiducial detection, achieving 96% detection with the misses caused by calcifications.
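The general idea behind Monte Carlo dropout uncertainty maps can be illustrated with a toy example. The sketch below is not the model or procedure of any reviewed study: it assumes a hypothetical per-voxel linear classifier (`weights`) applied to a feature map, keeps dropout active during repeated forward passes, and reports the voxel-wise mean as the prediction and the standard deviation as the uncertainty.

```python
import numpy as np


def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass of a toy voxel-wise classifier with dropout left on.

    features : (H, W, C) feature map (stand-in for deep network features)
    weights  : (C,) weights of a hypothetical linear output layer
    """
    keep = rng.random(features.shape) >= drop_rate
    dropped = features * keep / (1.0 - drop_rate)   # inverted dropout scaling
    logits = dropped @ weights                      # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))            # foreground probability


def mc_dropout(features, weights, n_passes=50, drop_rate=0.5, seed=0):
    """Repeat stochastic forward passes and summarise them per voxel.

    Returns (mean probability map, standard deviation map); the standard
    deviation serves as a simple uncertainty map.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [stochastic_forward(features, weights, drop_rate, rng) for _ in range(n_passes)]
    )
    return samples.mean(axis=0), samples.std(axis=0)
```

Voxels whose predicted probability changes substantially between passes receive a high standard deviation, flagging regions where a contour may deserve manual review.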

IMAGE SYNTHESIS
Synthesis is an exciting field of research, defined as translating one imaging modality into another. Benefits of synthesis include avoiding potential artifacts, reducing patient cost and discomfort, and avoiding radiation exposure. [243] In addition, utilizing multiple modalities introduces registration errors which can be avoided with synthetic images. Current methods in MRgRT include synthesis of sCT from MRI, sMRI from CT, and relative proton stopping power images from MRI. Other areas of synthesis research include creating higher resolution MRI (super-resolution) and predicting organ displacement based on periodic motion in 4D MRI. Segmentation can also be thought of as a special case of synthesis because the input MRI is translated into voxel-wise masks which assume discrete values according to their class. The distinction between synthesis and segmentation is particularly muddied when the segmentation ground truth is from a different imaging modality. [207] Synthesis architectures are fundamentally interchangeable with segmentation architectures but have diverged in practice. For example, U-Net, described in detail in Section 3, is the predominant backbone in both areas. However, synthesis models require that the entire image be translated, so they do not include two-stage architectures and are dominated by generative adversarial network (GAN)-based architectures. The GAN is composed of a CNN or self-attention-based generator which generates synthetic images. The generator competes with a discriminator which attempts to correctly classify synthetic and real images. As the GAN trains, a loss function is applied to the discriminator when it mislabels the image, whereas a loss function is applied to the generator when the discriminator is correct. The model is ideally considered trained once the discriminator can no longer correctly identify the synthetic images.
Conditional GANs (cGANs) expand on the standard GAN by also inputting a vector with random values or additional information into both the generator and discriminator. [171] In the case of MRI, the values of the vector can correspond to the MRI sequence type and clinical data to account for differences in patient population and setup. The CycleGAN adds an additional discriminator and generator loop. [274] For example, an MRI would be translated into an sCT. The sCT would then be translated into an sMRI. Since the input is ultimately tested against itself, this allows for training with unpaired data. This eliminates the need for co-registration, but significantly more data are required to achieve results comparable to paired training.
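The round trip that lets a CycleGAN train on unpaired data can be illustrated with a cycle-consistency loss. In this sketch the two "generators" are hypothetical closed-form intensity mappings rather than trained networks, and the L1 distance is an assumed choice of penalty; the adversarial terms of a full CycleGAN are omitted.

```python
def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(mri, ct, mri_to_ct, ct_to_mri):
    """Penalize failure to recover each input after a full translation cycle."""
    mri_cycle = ct_to_mri(mri_to_ct(mri))  # MRI -> sCT -> sMRI
    ct_cycle = mri_to_ct(ct_to_mri(ct))    # CT -> sMRI -> sCT
    return l1_distance(mri, mri_cycle) + l1_distance(ct, ct_cycle)


# Hypothetical stand-ins for trained generators (simple intensity mappings).
def mri_to_ct(img):
    return [2.0 * v - 1000.0 for v in img]


def ct_to_mri(img):
    return [(v + 1000.0) / 2.0 for v in img]


def bad_ct_to_mri(img):
    # A generator that is not the inverse of mri_to_ct.
    return [v / 2.0 for v in img]


# The MRI and CT samples are unpaired: they need not come from the same patient.
mri = [100.0, 400.0, 250.0]
ct = [-1000.0, 0.0, 500.0]

consistent = cycle_consistency_loss(mri, ct, mri_to_ct, ct_to_mri)
inconsistent = cycle_consistency_loss(mri, ct, mri_to_ct, bad_ct_to_mri)
```

Because each image is compared only with its own reconstruction, no voxel-wise correspondence between the MRI and CT sets is needed.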
Despite their success, GANs can be unstable during training and struggle in difficult synthesis problems. One way to improve their performance is with the Wasserstein GAN (WGAN) [6]. Instead of the discriminator classifying the images as real or fake, the WGAN measures the probability distributions of the real and fake images and finds the distance between them in the form of the Wasserstein distance. The discriminator attempts to maximize this distance while the generator attempts to minimize it. The WGAN approach often improves stability and performance. Although not limited to WGANs, spectral normalization is often included, which constrains the weights of the discriminator such that the gradient cannot explode. Another approach, claiming better performance than the WGAN, is the relativistic GAN (RGAN) [108]. The RGAN claims that the generator should, in addition to increasing the probability that synthetic images appear realistic, increase the probability that real images appear fake to the discriminator. Without this condition, in the late stages of training with a well-trained generator, the discriminator will conclude that every image it comes across is real. This goes against the a priori knowledge that half of the images are fake. A standard GAN can be converted to an RGAN by modifying its loss function.
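The two loss modifications described above can be compared side by side in a short sketch. It assumes the discriminator (critic) returns an unbounded score per image; the function names are illustrative, the Lipschitz constraint a WGAN needs (for example weight clipping, a gradient penalty, or spectral normalization) is omitted, and the relativistic form shown is the pairwise "relativistic standard GAN" variant.

```python
import math


def mean(values):
    return sum(values) / len(values)


def wgan_critic_loss(critic_real, critic_fake):
    # The critic maximizes mean(real) - mean(fake); minimize the negative.
    return mean(critic_fake) - mean(critic_real)


def wgan_generator_loss(critic_fake):
    # The generator tries to raise the critic's score on synthetic images.
    return -mean(critic_fake)


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def relativistic_discriminator_loss(critic_real, critic_fake):
    # Real images should score higher than the fake images they are paired with.
    return -mean([log_sigmoid(r - f) for r, f in zip(critic_real, critic_fake)])


def relativistic_generator_loss(critic_real, critic_fake):
    # The generator wants fakes to look more realistic than real images.
    return -mean([log_sigmoid(f - r) for r, f in zip(critic_real, critic_fake)])


critic_real = [2.0, 4.0]   # assumed critic scores for real images
critic_fake = [1.0, 1.0]   # assumed critic scores for synthetic images
w_critic = wgan_critic_loss(critic_real, critic_fake)
w_gen = wgan_generator_loss(critic_fake)
r_disc = relativistic_discriminator_loss(critic_real, critic_fake)
r_gen = relativistic_generator_loss(critic_real, critic_fake)
```

Note that the relativistic generator loss depends on the scores of real images as well as synthetic ones, which is the property the RGAN argument above relies on.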

MRI-Based Synthetic CT
MRI-based sCT is the most extensively researched and influential application of synthesis models in radiation therapy. While MR images provide excellent soft tissue contrast, they do not contain the necessary attenuation information for dose calculation that is embedded in CT images. Owing to this limitation, CT has traditionally been the workhorse for treatment planning while MRI has been relegated to diagnostic applications. However, CT suffers from lower soft tissue contrast and imparts a non-negligible radiation dose, especially for patients receiving standard fractionated image-guided radiation therapy (IGRT). In addition, metallic materials found in dental work and implants can lead to severe artifacts in CT, reducing the quality of the treatment plan. By augmenting CT with sCT, these problems can be avoided. Furthermore, according to the "As Low As Reasonably Achievable" (ALARA) principle, the replacement of CT with sCT for an MRI-only workflow could be justified by its high accuracy, especially in radiosensitive populations like pediatric patients. [184,242]
Calculation of dose distribution using MRI-based sCT can be enhanced by replacing traditional Monte Carlo simulation (MC) techniques with deep learning. MC accurately predicts the dose distribution based on physical principles, including the electron return effect (ERE), which adds additional dose to boundaries with different proton densities in the presence of a magnetic field. However, the technique can be extremely slow, as it relies on randomly generating paths of tens of thousands of particles. A higher number of particles reduces dosimetric uncertainty but lengthens the calculation. This problem is particularly noticeable in proton therapy, where MC or pencil beam algorithm (PBA) calculations can take several minutes on a CPU, and it can take hours to optimize a single treatment plan. [192] As a result, compromises must be made in clinical practice between dosimetric uncertainty, MC run time, and treatment plan optimization. Deep learning methods show exceptional potential to improve upon MC dose calculation models. Once trained, deep learning algorithms take only a few seconds to synthesize a dose distribution. In addition, they can be trained on extremely high accuracy MC-generated dose distributions that would be impractical in everyday clinical practice.
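The trade-off between particle count and dosimetric uncertainty can be illustrated with a toy Monte Carlo estimate. This is only a sketch of the statistical behavior: the exponential energy-deposit distribution and the name `mc_dose_estimate` are arbitrary assumptions, and no particle transport physics is simulated.

```python
import math
import random


def mc_dose_estimate(n_particles, seed=0):
    """Toy Monte Carlo: mean energy deposited per particle and its standard error."""
    rng = random.Random(seed)
    # Assumed deposit distribution with a true mean of 1.0 (arbitrary units).
    deposits = [rng.expovariate(1.0) for _ in range(n_particles)]
    mean = sum(deposits) / n_particles
    var = sum((d - mean) ** 2 for d in deposits) / (n_particles - 1)
    # The standard error of the mean shrinks as 1 / sqrt(n_particles).
    return mean, math.sqrt(var / n_particles)


mean_small, se_small = mc_dose_estimate(100)
mean_large, se_large = mc_dose_estimate(10_000)
```

Going from 100 to 10,000 simulated particles costs 100 times more computation but reduces the statistical uncertainty by only about a factor of 10, which is why high-accuracy MC is expensive.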
The primary challenge to sCT methods is the accurate reconstruction of bone and air, due to their low proton density and weak signal. This can make it difficult for sCT models to distinguish between the two, leading to large errors. Further complicating the issue, bone makes up a small fraction of the patient volume in radiation therapy applications, which is similar to the "small tumor problem" seen in segmentation. Other issues that can arise include small training sets, misalignment between CT and MRI, and causes of high imaging variability such as intestinal gas.
To evaluate sCT performance, various metrics are used to compare voxel values between the ground truth CT and sCT. The most common metric is the mean absolute error (MAE) [140,276], which is reported in Tables 5 and 6 if available. The MAE is defined below in Eq 2, where x_i and y_i are the corresponding voxel values of the CT and sCT, respectively, and n is the number of voxels.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right| \tag{2}$$
The MAE is typically reported in Hounsfield units (HU) but can also be dimensionless if reported with normalized units. Other common metrics in literature are the mean error [129], which forgoes the absolute value in MAE, the mean squared error (MSE) [258], which substitutes the square for the absolute value, and the Structural Similarity Index (SSIM), which varies from -1 to 1, where -1 represents extremely dissimilar images and 1 represents identical images. [204] A full discussion of these metrics can be found in Necasova et al. [179]
Since sCT is primarily intended for treatment planning, dosimetric quantities which measure the deviation between CT- and sCT-derived plans are often reported. One of the most common metrics is gamma analysis. Repurposed from its use as a metric to compare treatment plan dose to actual dose on LINACs, gamma analysis looks at each point on the dose distribution and evaluates if the acceptance criteria are met. The American Association of Physicists in Medicine (AAPM) Task Group 119 recommends a low dose threshold of 10%, meaning that points which receive less than 10% of the maximum dose are excluded from the calculation. Other metrics include the mean dose difference and the minimum dose delivered to 95% of the clinical treatment volume (D95) difference.
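The voxel-wise metrics and a simplified gamma analysis can be sketched as follows. The sign convention for the mean error (CT minus sCT) is an assumption, the HU values and dose profiles are made up for illustration, and the gamma routine is a 1D global version that searches only grid points without interpolation, so it is a teaching aid rather than a clinical implementation.

```python
def mean_absolute_error(ct, sct):
    return sum(abs(x - y) for x, y in zip(ct, sct)) / len(ct)


def mean_error(ct, sct):
    # Same as MAE without the absolute value (sign convention assumed: CT - sCT).
    return sum(x - y for x, y in zip(ct, sct)) / len(ct)


def mean_squared_error(ct, sct):
    return sum((x - y) ** 2 for x, y in zip(ct, sct)) / len(ct)


def gamma_pass_rate(ref_dose, eval_dose, spacing_mm,
                    dose_pct=2.0, dta_mm=2.0, low_dose_pct=10.0):
    """Fraction of reference points with gamma <= 1 on a 1D dose profile."""
    max_dose = max(ref_dose)
    dose_tol = dose_pct / 100.0 * max_dose       # global dose criterion
    threshold = low_dose_pct / 100.0 * max_dose  # low dose threshold
    passed = 0
    evaluated = 0
    for i, d_ref in enumerate(ref_dose):
        if d_ref < threshold:
            continue  # excluded from the analysis
        # Squared gamma: best combined distance/dose agreement over all points.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2
            + ((d_eval - d_ref) / dose_tol) ** 2
            for j, d_eval in enumerate(eval_dose)
        )
        evaluated += 1
        if gamma_sq <= 1.0:
            passed += 1
    return passed / evaluated


# Hypothetical voxel values in HU for a CT and its sCT.
ct = [-1000, 0, 40, 1200]
sct = [-950, 10, 30, 1000]
mae = mean_absolute_error(ct, sct)
me = mean_error(ct, sct)
mse = mean_squared_error(ct, sct)

# Hypothetical dose profiles (Gy) sampled every 1 mm.
ref = [0.0, 0.5, 1.0, 2.0, 2.0, 1.0, 0.5, 0.0]
scaled = [1.1 * d for d in ref]
rate_identical = gamma_pass_rate(ref, ref, spacing_mm=1.0)
rate_scaled = gamma_pass_rate(ref, scaled, spacing_mm=1.0)
```

In this example the large error in the bone-like voxel dominates the MSE far more than the MAE, and a uniform 10% dose scaling fails a 2%/2 mm criterion even though the profile shape is unchanged.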
Among notable MRI-based sCT works for photon radiation therapy, several take advantage of cGANs to include additional information. Liu et al improves upon the CycleGAN by including a dense block, which captures structural and textural information and better handles local mismatches between MRI and ground truth CT images. In addition, a compound loss function with adversarial and distance losses improves boundary sharpness. An example patient is shown in Figure 4. [...] variant spaces between high- and low-resolution Dixon MRI with the Huber distance. [197] In addition, separable convolutions are used to reduce parameters, and a relativistic loss function is applied to improve training stability. Finally, Zhao et al presents the first MRI-based sCT paper to implement a hybrid transformer-CNN architecture, outperforming other state-of-the-art methods. Their method implements a conditional GAN. The generator consists of CNN blocks in the shallow layers to capture local context and save computational resources, while transformers are used in deeper layers to provide better global context. [267]
Generating sCTs from MRI for the purposes of proton therapy is not fundamentally different from the process for photon therapy. However, proton therapy takes advantage of the Bragg peak, which concentrates the radiation in a small region to spare healthy tissue. While this is beneficial, it places a tighter constraint on sCT errors. Another difference is that sCT images must first be converted to relative proton stopping power maps before they can be used in treatment planning. Therefore, directly generating synthetic relative proton stopping power (sRPSP) maps instead of sCT would be ideal.
Boron therapy is a form of targeted radiation therapy in which boronated compounds are delivered to the site of the tumor and irradiated with neutrons. The boron undergoes a fission reaction, releasing alpha particles that kill the tumor cells.
However, the targeting mechanism typically relies on cancer cells' high metabolic rate. Epidermal tissue, which also has a high metabolic rate, takes up boron as well, making skin dose an important concern in boron therapy. Therefore, methods for generating sCT images for boron therapy should emphasize accurate reconstruction around the skin.
As shown in Table 6, many methods show high dosimetric accuracy for proton therapy. Liu et al develops a conditional cycleGAN to synthesize both high- and low-energy CT. Multiple loss functions are also used to accurately classify and recreate the sCT. [152] Wang et al creates the first synthetic relative proton stopping power maps from MRI with a cycleGAN and loss function to take advantage of paired data. Their method achieves an excellent MAE of 42 ± 13 HU, but struggles with dosimetric accuracy. [239] Maspero et al achieves a 2%/2mm gamma pass rate above 99% for proton therapy by averaging predictions from three separate GANs trained on axial, sagittal, and coronal views, respectively. [164] Replacing traditional MC dose calculation methods, Tsekas et al generates VMAT (volumetric modulated arc therapy) dose distributions in static positions with sCT. Additional input parameters include a mask of the tissue exposed to the beam, the distance from the LINAC source, the distance from the central beam, and the radiological depth. [232] These parameters are input into a 3D U-Net, significantly increasing processing speed. Finally, SARU, a self-attention Res-UNet, lowers skin dose for boron therapy, achieving better results than the pix2pix method. [268]

CT and CBCT-Based Synthetic MRI
Generating sMRI from CT leverages MRI's high soft tissue contrast for improved segmentation accuracy and pathology detection in CT-only treatment planning. In addition, the ground truth X-ray attenuation information is maintained, in contrast to an MRI-only workflow. Cone beam CT (CBCT) is primarily used for patient positioning before each fraction of radiation therapy. Kilovoltage (kV) and megavoltage (MV) energies are standard in CBCT, with kV images providing superior contrast and MV images providing superior tissue penetration. However, noise and artifacts can often reduce CBCT image quality. [217] Generating CBCT-based sMRI can yield higher image quality and soft-tissue contrast while also retaining CBCT's fast acquisition speed. The rapid acquisition time of CT and CBCT can make them preferable to MRI for patients with claustrophobia during the MR simulation or for pediatric patients who would require additional sedation. In addition, MRI is not suitable for patients with metal implants such as pacemakers. However, sMRI is significantly more challenging to generate compared to sCT. This is primarily due to the recovery of soft tissue structures visible only in MRI. For this reason, sMRI is often used to improve segmentation results in CT and CBCT. However, some studies report direct use of sMRI for segmentation. Since MRI intensity is only relative and not in definitive units like CT, MAE is much less meaningful than other metrics. Therefore, peak signal-to-noise ratio (PSNR) is preferentially reported. [22]

Intramodal MRI Synthesis and Super Resolution
It can be beneficial to synthesize MRI sequences from other MRI sequences. Intra-modal applications include generating synthetic contrast MRI to prevent the need for injected contrast, super-resolution MRI to improve image quality and reduce acquisition time, and synthetic 7T MRI, which offers improved spatial resolution and contrast but lacks widespread availability. [210] To reduce complexity and cost, a potential approach to radiation therapy is to rotate the patient instead of using a gantry. However, the patient's organs deform under gravity, requiring multiple MRIs at different angles for MRgRT. MR images of patients rotated at different angles can better enable gantry-free radiation therapy. In this section, studies which synthesize MRI sequences from other MRI sequences are discussed.
Preetha et al synthesizes T1C images from multi-channel T1W, T2W, and FLAIR MRI sequences using the pix2pix architecture. [103] A cycleGAN with a ResUNet generator is trained to generate lateral and supine MR images for gantry-free radiation therapy. [27] ResUNet is also implemented to generate ADC uncertainty maps from ADC maps for prostate cancer and mesothelioma. [278] Studies designed explicitly for super-resolution include Chun et al and Zhao et al. In the former study, a U-Net based denoising autoencoder is trained to remove noise from clinical MRI. Since there is a limited number of paired low-resolution and high-resolution MR images, a CNN is trained to downsample high resolution data from this dataset. Finally, a GAN utilizing both residual and skip connections synthesizes the high resolution MRI with high accuracy. [33] The same architecture is employed in Kim et al [123] for real-time 3D MRI to increase spatial resolution. In addition, dynamic keyhole imaging is formulated to reduce acquisition time by only sampling central k-space data associated with contrast. The peripheral k-space data associated with edges is added from previously generated super-resolution images in the same position. [123] Zhao et al makes use of super-resolution for brain tumor segmentation, increasing the Dice score from 0.724 to 0.786 with 4x super-resolution images generated from a GAN architecture. The generator has low- and high-resolution paths and dense blocks. [272] Often in clinical practice, the through-plane resolution is reduced to shorten the MRI scan time. Xie et al achieves near perfect accuracy in recovering 1 mm from 3 mm through-plane resolution by training parallel CycleGANs which predict the higher resolution coronal and sagittal slices, respectively. These predictions are then fused to create the final 3D prediction. [248] No studies have yet been published that synthesize 7T MRI for radiation therapy.
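The keyhole idea described above, in which only central k-space is newly sampled and peripheral k-space is borrowed from a prior image, can be sketched in a few lines of NumPy. This is an illustrative 2D Cartesian version under stated assumptions: the function name `keyhole_reconstruct`, the `keep_fraction` parameter, and the random test image are hypothetical, and the new frame's central k-space is simulated by transforming a full image instead of acquiring it.

```python
import numpy as np


def keyhole_reconstruct(new_image, prior_image, keep_fraction=0.25):
    """Merge central k-space of a new frame with peripheral k-space of a prior image."""
    k_new = np.fft.fftshift(np.fft.fft2(new_image))
    k_prior = np.fft.fftshift(np.fft.fft2(prior_image))

    rows, cols = new_image.shape
    half_r = max(1, int(rows * keep_fraction / 2))
    half_c = max(1, int(cols * keep_fraction / 2))
    center_r, center_c = rows // 2, cols // 2

    # Symmetric window around the k-space center (low spatial frequencies).
    mask = np.zeros((rows, cols), dtype=bool)
    mask[center_r - half_r:center_r + half_r + 1,
         center_c - half_c:center_c + half_c + 1] = True

    # Contrast comes from the new frame, edge detail from the prior image.
    k_combined = np.where(mask, k_new, k_prior)
    return np.fft.ifft2(np.fft.ifftshift(k_combined)).real


rng = np.random.default_rng(0)
prior = rng.random((32, 32))   # stand-in for a prior high-resolution image
new = 2.0 * prior              # same anatomy with a global contrast change
recon = keyhole_reconstruct(new, prior)
```

Because the zero-frequency term is taken from the new frame, the reconstruction follows the new frame's overall intensity while reusing fine detail from the prior image.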

RADIOMICS (CLASSIFICATION)
Unlike synthesis, which maps one imaging modality to another, radiomics extracts imaging data to classify structures or to predict a value. Deep learning applications to MRI-based radiomics often achieve state-of-the-art performance over hand-crafted methods in detection and treatment outcome prediction tasks. Traditional radiomics algorithms apply various hand-crafted matrices based on shape, intensity, texture, and imaging filters to generate features. The majority of these features have no predictive power and would confuse the model if all were directly implemented. Therefore, an important step is feature reduction, which screens out features without statistical significance. Typically, this is done with a statistical method such as analysis of variance (ANOVA), Least Absolute Shrinkage and Selection Operator (LASSO), or ridge regression. Alternatively, a CNN or other neural network can learn significant features. The advantage of the deep learning approach is that the network can learn any relevant features, including hand-crafted ones. However, this assumes a large enough dataset, which can be problematic for small medical datasets. Hand-crafted features have no such constraint and are easily interpretable. It is often the case that a hybrid approach including both hand-crafted and deep learning features yields the highest performance. Biometric data like tumor grade, patient age, and biomarkers can also be included as features. Once the significant features are found, supervised machine learning algorithms like support vector machines, artificial neural networks, and random forests are employed to make a prediction from these features. Recently, CNNs like Xception and InceptionResNet [221], recurrent neural networks with GRU and LSTM blocks, and transformers have also found favor in this task, as introduced in Section 3. Radiomics can also be done purely with deep learning, as is done with segmentation and synthesis.
In this section, we divide the studies into those detecting or classifying objects in the image and those predicting a value such as the likelihood of distant metastases, treatment response, and adverse effects. While detection is traditionally under the purview of segmentation, the architectures of detection methods and the classification task are shared with other radiomics methods, and so are discussed here.
While radiomics algorithms can excel on local datasets, the main concern for MRI applications is the generalizability of the methods. Variability in MR imaging characteristics such as field strength, scanner manufacturer, pulse sequence, ROI or contour quality, and the feature extraction method can result in different features being significant. This variability can largely be mitigated by normalizing the data to a reference MRI and including data from multiple sources. [35]
Classification accuracy is an appealing evaluation metric due to its simplicity, but accuracy can be misleading with unbalanced data. For example, if 90% of tumors in the dataset are malignant, a model can achieve 90% accuracy by labeling every tumor as malignant. Precision [14], the ratio of true positives to all examples labeled as positive by the classifier, and recall [119], the ratio of true positives to all actual positives, are also both affected by imbalanced data. The F1 score [80], defined in Eq 3, ranges from 0 to 1 and combines precision and recall to provide a single metric. A high F1 value indicates both high precision and recall and is resilient towards unbalanced data.
$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{3}$$
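The unbalanced example above can be worked through numerically. The sketch below builds a hypothetical dataset of 90 malignant and 10 benign tumors and a classifier that labels everything malignant; the function names and the choice to evaluate the F1 score with the minority (benign) class as positive are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class chosen as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical dataset: 1 = malignant (90 cases), 0 = benign (10 cases).
y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100  # a trivial model that labels every tumor as malignant

acc = accuracy(y_true, y_pred)
malignant_scores = precision_recall_f1(y_true, y_pred, positive=1)
benign_scores = precision_recall_f1(y_true, y_pred, positive=0)
```

The trivial model reaches 90% accuracy, yet its F1 score for the benign class is 0 because it never identifies a single benign tumor. The F1 score for the majority class remains high, so which class is treated as positive should be reported alongside the score.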
The most common evaluation metric resistant to unbalanced data is the area under the curve (AUC) of a receiver operating characteristic (ROC) curve [22,99,182]. In a ROC curve, the x-axis represents the false positive (FP) rate while the y-axis represents the true positive (TP) rate. In addition, the ROC curve can be viewed as a visual representation to help find the best trade-off between sensitivity and specificity for the clinical application by comparing one minus the specificity versus the sensitivity of the model. The AUC value provides a measurement for the overall performance of the model, with a value of 0.5 representing random chance and a value of 1 being perfect classification. If the AUC value is below 0.5, the classifier would simply need to invert its predictions to achieve higher accuracy. It is important to note that all these metrics are for binary classification but are commonly used in multi-class classification by comparing a particular class with an amalgamation of every other category. Finally, the concordance index (C-index) measures how well a classifier predicts a sequence of events and is most appropriate for prognostic models which predict the timing of adverse effects, tumor recurrence, or patient survival times. The C-index ranges from 0 to 1, with a value of 1 being perfect prediction. [82,160] A full discussion of evaluation metrics for classification tasks is found in Hossin and Sulaiman. [84]
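Both ranking metrics can be computed directly from pairwise comparisons, as in the sketch below. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which equals the area under the ROC curve; the C-index follows Harrell's pairwise definition with censoring. The function names and toy data are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def concordance_index(times, risks, events):
    """Harrell's C-index: higher predicted risk should mean an earlier event."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable if subject i had an observed event
            # before the event or censoring time of subject j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
auc_inverted = roc_auc(labels, [1.0 - s for s in scores])

times = [2, 4, 6, 8]            # hypothetical months to recurrence or censoring
events = [1, 1, 1, 0]           # 1 = event observed, 0 = censored
risks = [0.9, 0.7, 0.4, 0.1]    # predicted risk scores
c_index = concordance_index(times, risks, events)
```

Inverting the scores turns an AUC of 0.75 into 0.25, which illustrates why a classifier with an AUC below 0.5 only needs to flip its predictions.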

Cancer Detection and Staging
Effectively detecting and classifying tumors is vital for treatment planning. Deep learning detection methods supersede segmentation algorithms when the tumors are difficult to accurately segment or cannot easily be distinguished from other structures. In addition, detection models can further improve segmentation results by eliminating false positives. When applied to MRI, detection studies also have the potential to differentiate between cancer types and tumor stage to potentially avoid unnecessary invasive procedures like biopsy.
The majority of works in detection are for brain lesion classification. Chakrabarty et al attains exceptional results in differentiating between common types of brain tumors with a 3D CNN and outperforms traditional hand-crafted methods. [22] Radiation-induced cerebral microbleeds appear as small dark spots in 7T time-of-flight magnetic resonance angiography (TOF MRA) and can be difficult to distinguish from lookalike structures.

Treatment Response
The decision to treat with radiation therapy is often definitive. Since radiation dose will unavoidably be delivered to healthy tissue, treatment response and the risk of adverse effects are heavily considered. Further compounding the decision, dose to healthy tissue is cumulative, complicating any subsequent treatments. In addition, unknown distant metastasis can derail radiation therapy's curative potential. Therefore, predicting treatment response and adverse effects is of high importance, and significant work has gone into applying deep learning algorithms to prognostic models.
Diffusion-weighted imaging (DWI) has attracted strong interest in studies which predict the outcome of radiation therapy. DWI measures the diffusion of water through tissue, often yielding high contrast for tumors. Cancers can be differentiated by altering DWI's sensitivity to diffusion with the b-value, in which higher b-values correspond to an increased sensitivity to diffusion. By sampling at multiple b-values, the attenuation of the MR signal can be measured locally in the form of apparent diffusion coefficient (ADC) values. A drawback of DWI is that the spatial resolution is often significantly worse than T1W and T2W imaging. [167] Unlike segmentation and synthesis, which require highly accurate structural information, treatment outcome prediction does not require high spatial resolution, so the functional information from DWI is most easily exploited in predictive algorithms.
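The relationship between b-values and ADC can be sketched under the commonly used mono-exponential signal model S(b) = S0 · exp(-b · ADC), which is an assumption here since the text does not specify a model. The function names and the simulated voxel signals are illustrative.

```python
import math


def adc_two_point(signal_low, signal_high, b_low, b_high):
    """ADC from two acquisitions: ln(S_low / S_high) / (b_high - b_low)."""
    return math.log(signal_low / signal_high) / (b_high - b_low)


def adc_log_linear_fit(b_values, signals):
    """ADC as the negative least-squares slope of ln(S) versus b."""
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    b_mean = sum(b_values) / n
    log_mean = sum(logs) / n
    num = sum((b - b_mean) * (l - log_mean) for b, l in zip(b_values, logs))
    den = sum((b - b_mean) ** 2 for b in b_values)
    return -num / den


# Simulated voxel: S0 = 1000 (a.u.), ADC = 1.0e-3 mm^2/s, b in s/mm^2.
true_adc = 1.0e-3
b_values = [0.0, 500.0, 1000.0]
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]

adc_pair = adc_two_point(signals[0], signals[2], b_values[0], b_values[2])
adc_fit = adc_log_linear_fit(b_values, signals)
```

With noise-free signals both estimates recover the simulated ADC; with real data the multi-point fit is generally less sensitive to noise in any single acquisition.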

REAL-TIME AND 4D MRI
Real-time MRI during treatment has recently been made possible in the clinical setting with the creation of the MRI-LINAC. Popular models include the ViewRay MRIdian (ViewRay Inc, Oakwood, OH) and the Elekta Unity (Elekta AB, Stockholm). The electron return effect (ERE), which increases dose at boundaries with differing proton densities, such as the skin, in an external magnetic field, guides the design of these models. [246] At higher field strengths, the ERE becomes more significant, but MR image quality increases. In addition, a higher field strength can reduce the acquisition time for real-time MRI. Therefore, a balance must be struck. Both the Elekta Unity and ViewRay MRIdian, with 1.5T and 0.35T magnetic fields, respectively, compromise by choosing lower field strengths. The Elekta Unity prioritizes image quality and real-time tracking capabilities at the expense of a more severe ERE. [195]
The MRI-LINAC has enabled an exciting new era of ART wherein anatomical changes and changes to the tumor volume can be accurately discerned and optimized between treatment fractions. In addition, unique to MRgRT, the position of the tumor can be directly monitored during treatment, potentially leading to improved tumor conformality and improved patient outcomes. [190] Periodic respiratory and cardiac motion are common sources of organ deformation and should be accounted for to achieve optimal dose delivery to the PTV. Tracking these motions is problematic with conventional MRI since scans regularly take approximately 2 minutes per slice, leading to a total typical scan time of 20 to 60 minutes. [51] In addition to motion restriction techniques like patient breath-hold, cine MRI accounts for motion in real-time by reducing acquisition times to 15 seconds or less. This is achieved by only sampling one (2D) or more (3D) slices with short repetition times, increasing slice thickness, and undersampling. In addition, the MR signal is sampled radially in k-space to reduce motion artifacts.
Capturing a 3D volume across multiple timesteps of periodic motion is known as 4D MRI. [218] Deep learning methods can further reduce acquisition time by reconstructing intensely undersampled cine MRI slices. In addition to reconstructing from undersampled k-space MRI sequences, several approaches further reduce acquisition time. In the first approach, cine MRI and/or k-space trajectories are used to predict the timestep of a previously taken 4D MRI. However, this method requires a lengthy 4D MRI and does not adapt to changes in the tumor volume over the course of the treatment. Additional approaches include synthesizing a larger volume than the cine MRI slice captures to reduce acquisition time, predicting the deformation vector field (DVF) which relays real-time organ deformation information, or determining the 3D iso-probability surfaces of the organ to stochastically determine tumor position if real-time motion adaptation is not possible. As shown in Table 11, this category is experiencing rapid growth, with the majority of papers being published within the current year. Notable works include Gulamhussene et al, which predicts a 3D volume from 2D cine MRI or a 4D volume from a sequence of 2D cine MR slices. A simple U-Net, introduced in Section 3, is implemented to reduce inference time. The performance degrades for synthesized slices far away from the input slices but achieves an exceptional target registration error. [77] Nie et al instead uses autoregression and LSTM time series modeling to predict the diaphragm position and to find the matching 4D MRI volume. Autoregression outperforms the LSTM model, which could be attributed to a low number of patients. [181] Patient motion is alternatively predicted in Terpstra et al by using undersampled 3D cine MRI to generate the DVF with a CNN, achieving low target registration error. [226] Similarly, Romaguera et al predicts liver deformation using a residual CNN and prior 2D cine MRI.
This prediction is then input into a transformer network to predict the next slice. [201] Driever et al simply segments the stomach with U-Net and constructs iso-probability surfaces centered about the center of mass to isolate respiratory motion. These probability distributions can then be implemented in treatment planning. [50]

OVERVIEW AND FUTURE DIRECTIONS
Innovations in deep learning and MRI are complementary and growing at a fast pace. As shown in Figure 2, the complexity of deep learning algorithms is rapidly increasing. New systems like the MRI-LINAC have allowed for adaptive radiation therapy and real-time MRI during treatment. Despite the successes of the studies reviewed, there are more challenges to overcome.

Many challenges in deep learning applications to MRI are related to limited computational resources. While MRI offers high resolution data which complements deep learning's big data approach, the typical 3D image size is over one gigabyte of data, which means that concessions must be made to apply deep learning methods. These include downsampling the original MRI, forgoing 3D convolution, and processing the images in small patches. As field strengths increase to 7 Tesla and beyond, yielding higher resolution images, and as models become more powerful and expensive, computational challenges remain ever-present despite Moore's Law. [174] Therefore, the task is to most efficiently utilize available computational resources. One source of innovation is the increasing optimization of hardware for computer vision. The improved hardware yields higher performance and efficiency. For example, computer vision tasks have progressed from the central processing unit (CPU) to the graphics processing unit (GPU) and often to the tensor processing unit (TPU). Neuromorphic hardware has demonstrated exceptionally high efficiency and could have applications in real-time MRgRT. Deep spiking neural networks (DSNNs), which are designed for neuromorphic hardware, can more accurately model the human brain than traditional neural networks, allowing for real-time learning and adaptation. [251] For instance, the adaptive capabilities of DSNNs could find an application in real-time MRI. Instead of only relying on local context such as in ROI methods and convolution, attention and the transformer allow for direct global context by focusing on relevant regions. Currently, hybrid CNN-transformer architectures are gaining traction by strategically placing transformer layers to improve performance while also keeping the models computationally viable with convolutional layers.
[211] In the future, it is foreseeable that pure self-attention models such as the transformer will become state-of-the-art with more powerful hardware and more efficient approaches. This trend towards attention and self-attention models is shown in Figure 2, with a growing interest in attention over the last three years. This could also partially explain the drop in studies using 3D convolution and GAN architectures since more studies are devoting their resources to attention. Another cause for fewer GANs is that nine fewer studies were published in 2022 on MRI synthesis, in which GANs are the current state-of-the-art method. Finally, diffusion models, which work by gradually adding noise to an image and attempting to recreate it, are an alternative to GANs. Although diffusion models are computationally expensive, they can generate more realistic images than GANs and may soon find applications in super-resolution and under-sampled real-time MRI. [44,139]
Another challenge of deep learning applications to MRgRT is that MRgRT is still a nascent field. For example, prostate brachytherapy often uses fiducial markers designed for CT. Fiducials show exceptional contrast in CT imaging but are difficult to see on MRI and can be challenging for segmentation methods. [215] It is likely that, with the maturation of the field, fiducial markers designed for MRI will see greater adoption or will no longer be necessary for many applications since organ motion can be directly monitored with MRgRT. [124] In addition, the limited availability of high quality public datasets often remains a roadblock. However, the growing number of yearly competitions like the BraTS challenge and public databases like The Cancer Imaging Archive (TCIA) have mitigated this effect.
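The noise-adding half of the diffusion models mentioned above can be sketched in closed form. The snippet assumes a DDPM-style forward process, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 - ᾱ_t) · ε with Gaussian ε, and a linear noise schedule; the schedule endpoints, the function names, and the constant test image are illustrative assumptions, and the learned denoising network that reverses the process is not shown.

```python
import numpy as np


def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (endpoint values are assumed, not prescribed)."""
    return np.linspace(beta_start, beta_end, steps)


def forward_diffuse(image, t, betas, rng):
    """Sample the noisy image at step t directly from the clean image."""
    # Cumulative product of (1 - beta) gives the fraction of signal retained.
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(image.shape)
    noisy = np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
image = np.ones((8, 8))  # stand-in for a normalized MR image

early, _ = forward_diffuse(image, 0, betas, rng)       # almost unchanged
late, late_noise = forward_diffuse(image, 999, betas, rng)  # almost pure noise
signal_left = np.cumprod(1.0 - betas)[-1]
```

A denoising network is trained to predict `noise` from the noisy image and the step index, and image generation then runs this process in reverse, one step at a time, which is the source of the computational expense noted above.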
Deep learning has also enhanced the capabilities of MRI. sCT enables MRI to generate X-ray attenuation information, super-resolution algorithms can reduce the time of acquisition or enhance clinical MRI resolution, and synthetic contrast MRI can achieve similar results to T1C MRI with T1W and T2W sequences. These are actively being researched and should improve with time. The new frontier of MRI research is the MRI-LINAC and 7T MRI. Despite better image details at higher field strengths, the ERE increases in severity, which causes unwanted dose at air-tissue interfaces. Therefore, current MRI-LINAC models operate at 0.35 and 1.5 Tesla while diagnostic MRI is commonly at 1.5 and 3 Tesla. A solution might be to synthesize higher field strength MRI from low-field MRI. Similarly, 7T MRI is gaining clinical acceptance for diagnostic imaging. Despite its greater detail, it is not readily available and is more prone to artifacts associated with high strength, inhomogeneous magnetic fields, making it a good candidate for synthesis algorithms. [227] Synthesis of other modalities from MRI such as ultrasound, positron emission tomography (PET) imaging, and pathological images is also on the horizon as suitable datasets become available.
A prevailing theme is the cross-pollination from different disciplines. The LSTM, GRU, and transformer models were originally developed for language translation and time series estimation but are now common in radiomics, synthesis, and segmentation. The GAN is mainly implemented in synthesis but has found applications in several segmentation architectures. Similarly, super-resolution and CT- and CBCT-based sMRI improve segmentation accuracy. It is foreseeable that MRI-based synthetic functional imaging like PET, single photon emission computed tomography (SPECT), and functional MRI (fMRI) could also improve radiomics performance. Reinforcement learning has found success in two-stage segmentation networks by optimally adjusting the bounding box. Monte Carlo Dropout (MCDO) has been implemented in segmentation models to visualize uncertainty but could also be used to visualize synthesis uncertainty. [69] Similarly, it is common to include biometric and MRI scanner manufacturer information in radiomics, and synthesis papers have recently incorporated scanner information to enhance predictions. Clinical data commonly applied to radiomics methods like patient age, prostate specific antigen (PSA) level, and biopsy data may improve MRI synthesis and segmentation methods in the future. In addition, genomics data could refine treatment plans by predicting the radiosensitivity of the patient and tumor and by enhancing the prognostic value of radiomics methods. [113] Another source of inspiration is the sRPSP maps applied in proton therapy, which bypass sCT to give accurate attenuation information of protons. It is also foreseeable that a treatment plan or dose distribution could be synthesized directly from MRI without the need for sCT in an MRI-only workflow. Given these developments, inter-field innovation will continue to play an important role in the development of deep learning applications to MRgRT.

CONCLUSION
In summary, deep learning approaches to MRgRT represent the state-of-the-art in segmentation, synthesis, radiomics, and real-time MRI. These algorithms are expected to continue to improve rapidly and to enable precise, adaptive radiation therapy and an MRI-only workflow.